Identifying Anomalies

Identify anomalies in the given dataset to further understand the process.

In anomaly detection systems, we usually want to identify whether we have an anomaly right now, and send an alert if we do.

To identify if the last data point is an anomaly, we start by calculating the mean and standard deviation for each status code in the past hour:

To get the last value in a GROUP BY along with the mean and standard deviation, we use a little array trick.
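The aggregation described above can be sketched in Python. This is only an illustration under stated assumptions: the rows are made up, the names `rows` and `summarize` are hypothetical, and the sample standard deviation is assumed. Pairing each minute with its entry count and taking the maximum pair is one way to pick the most recent value within a group.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical rows: (minute, status_code, entries). Not data from the lesson.
rows = [
    (1, 200, 10), (2, 200, 12), (3, 200, 11),
    (1, 400, 0), (2, 400, 1), (3, 400, 8),
]

def summarize(rows):
    """Return {status_code: (last_value, mean, stddev)} for the given rows."""
    by_code = defaultdict(list)
    for minute, status_code, entries in rows:
        by_code[status_code].append((minute, entries))

    stats = {}
    for status_code, pairs in by_code.items():
        # Tuples compare element by element, so max() over (minute, entries)
        # pairs returns the pair with the latest minute.
        last_value = max(pairs)[1]
        values = [entries for _, entries in pairs]
        # The sample standard deviation needs at least two data points.
        sd = stdev(values) if len(values) > 1 else 0.0
        stats[status_code] = (last_value, mean(values), sd)
    return stats

stats = summarize(rows)
```

Each status code ends up with one tuple holding its last value, mean, and standard deviation, which is the shape the next step needs.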

Next, we calculate the z-score for the last value for each status code:

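We calculate the z-score by finding the number of standard deviations between the last value and the mean. To avoid a “division by zero” error, we transform the denominator to NULL when the standard deviation is zero, so the z-score for that status code is NULL instead of an error.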
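A minimal Python sketch of the same calculation, assuming a hypothetical helper named `z_score`. Python's `None` stands in for NULL here: a zero (or missing) standard deviation yields `None` rather than raising an error.

```python
def z_score(last_value, mean_value, stddev_value):
    """Number of standard deviations between last_value and the mean."""
    # Mirror the NULL-denominator idea: a zero or missing standard
    # deviation gives None instead of a division-by-zero error.
    if not stddev_value:
        return None
    return (last_value - mean_value) / stddev_value
```

For example, `z_score(8, 3, 2.5)` gives `2.0`, while `z_score(5, 5, 0)` gives `None`.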

Looking at the z-scores we got, we can spot that status code 400 received a very high z-score of 6. In the past minute, we returned a 400 status code 24 times, which is significantly higher than the average of 0.73 in the past hour.
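As a quick sanity check on these figures, we can work backwards from the z-score formula. This sketch assumes the reported z-score of 6 and mean of 0.73 are rounded, so the result is only approximate.

```python
# Figures quoted in the text for status code 400.
last_value = 24
mean_value = 0.73
z = 6

# z = (last_value - mean) / stddev, so stddev = (last_value - mean) / z.
implied_stddev = (last_value - mean_value) / z
print(round(implied_stddev, 2))  # 3.88
```

A standard deviation of roughly 3.9 entries per minute means that 24 entries in a single minute sits far outside the usual range for this status code.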

Let’s take a look at the raw data:

It does look like we are getting more errors than expected in the last couple of minutes.

400 status code entries

What our naked eye missed in the chart and the raw data was found by the query and was classified as an anomaly. We are off to a great start!
